-
Notifications
You must be signed in to change notification settings - Fork 24
Add option to generate LM image and GC via two separate jobs #446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6b40167 to
b74c654
Compare
|
I just discovered during testing the LM image and GC are set on the post config, not in the normal config. This means splitting the LMGC-Job into two does not change the hash of any existing jobs. Therefore, I think this flag can be enabled by default, or rather, we can in-line the flag and make it the new default! WDYT? To finish testing I need to run a decoding w/ the new GC and new LM, but otherwise the stuff here is now tested. This is what it looks like when the flag is set in a pipeline w/ thousands of jobs (note due to the GC not being hashed only a few GC-jobs are even run): |
|
Apparently switching the default here changes hashes at AppTek. In case it is possible to live w/ the hash breakage (e.g. because it is in unused parts of the code) I'd like the flag to be on by default so as many folks as possible can profit from the changes here. If the hash breakage is unacceptable we leave it off by default of course. |
*Any existing search jobs. The graph however is changed as in all of the LMGC Jobs are now removed and separate LM and GC Jobs are added. This is caught in the pipeline.
Unfortunately all parts that are tested in the pipeline are used. |
|
I have successfully run recognitions w/ this setup. This is tested now. |
|
@curufinwe @michelwi can we merge this? |
Since the apptek test passes, I see no objections from our site. I'll dismiss my old review, but I currently don't have much time to re-review it.. sorry. |
recognition/advanced_tree_search.py
Outdated
| ) | ||
| for i, lm_config in enumerate(arpa_lms): | ||
| lm_config[1].image = lm_gc.out_lm_images[i + 1] | ||
| for i, (_lm_config, lm_post_config) in enumerate(arpa_lms): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
With two approves yes :). I just got a few comments then I can approve, but please ask someone else to also review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think many open comments should be addressed before merging.
|
is anyone still interested in having this merged or has this Job become irrelevant at the chair? |
I am interested, but not willing to spend time on it myself. |
Co-authored-by: michelwi <[email protected]> Co-authored-by: Benedikt Hilmes <[email protected]>
…i6_core into feat/separate-lmi-gc-generation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all open comments are addressed and the pipeline is green.
@Atticus1806 , @JackTemaki, @NeoLegends would you be able to take another look? Thx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit, otherwise LGTM! :)
recognition/advanced_tree_search.py
Outdated
| :param lmgc_alias: Alias for the AdvancedTreeSearchLmImageAndGlobalCacheJob | ||
| :param lmgc_scorer: Dummy scorer for the AdvancedTreeSearchLmImageAndGlobalCacheJob which is required but unused | ||
| :param lm_cache_method: Specifies, how the LM image and the global cache should be created: | ||
| JOINED (default) -> automatically create lm images and global cache as output of one job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: consider informing about the tradeoff between the different options here, i.e. that the default has hash issues.
|
I cannot approve since I originally created this PR; but LGTM! |
Closes #430
Closes #514
Now testing...